Abstract

In this paper we introduce a projection method for the space of probability distributions
based on the differential geometric approach to statistics. This method is based on a direct
$L^2$ metric as opposed to the usual Hellinger distance and the related Fisher information
metric. We explain how this apparatus can be used for the nonlinear filtering problem, also
in relation to earlier projection methods based on the Fisher metric. Past projection filters
focused on the Fisher metric and the exponential families that made the filter correction step
exact. In this work we introduce the mixture projection filter, namely the projection filter
based on the direct $L^2$ metric and on a manifold given by a mixture of pre-assigned densities.
The resulting prediction step in the filtering problem is described by a linear differential
equation, while the correction step can be made exact. We analyze the relationship of a
specific class of $L^2$ filters with the Galerkin-based nonlinear filters, and highlight the
differences with our approach, particularly concerning filtering problems with continuous-time
observations.

Keywords: Finite Dimensional Families of Probability Distributions, Exponential Families,
Mixture Families, Hellinger distance, Fisher information metric, Direct L2 metric,
Kullback-Leibler information
1 Introduction

In this paper we consider the nonlinear filtering problem in continuous time. For a quick
introduction to the filtering problem see Davis and Marcus (1981) [16]. For a more complete
treatment see Liptser and Shiryayev (1978) [2^ from a mathematical point of view or Jazwinski 
(1970) [22] for a more applied perspective. For recent results see the collection of papers [15].

The nonlinear filtering problem has an infinite-dimensional solution in general. Constructing
approximate finite-dimensional filters is therefore an important area of research.



*I am grateful to Giuseppe Tinaglia and Alexander Pushnitski for help with geometry and topology.
All remaining errors are my own.
When the system has a continuous time signal and continuous time observations, the solution
of the filtering problem is a stochastic PDE which can be seen as a generalization of the
Fokker-Planck equation expressing the evolution of the density of a diffusion process. This
filtering equation is called the Kushner-Stratonovich equation, and an unnormalized (simpler)
version of it is known as the Duncan-Mortensen-Zakai stochastic partial differential equation.
When observations are in discrete time, the filtering problem decomposes into a prediction step,
given by the Fokker-Planck equation, and a correction step, given by Bayes' formula.

In [13], [9] and [10] the Fisher metric is used to project the Kushner-Stratonovich (or the
Fokker-Planck) equation onto an exponential family of probability densities, yielding the new
class of approximate filters called projection filters. The projection filters are based on the
differential geometric approach to statistics, as developed by [2] and [30]. It is also shown that
one can choose the family so as to make the correction step exact. Moreover, it is shown that
for exponential families the projection filters coincide with the assumed density filters.

In [11], [12] the Gaussian projection filter is studied in the small-noise setting.

In the present paper we choose a different differential geometric structure based on a direct
$L^2$ metric as opposed to the usual Hellinger distance and the related Fisher information metric.
We explain how this structure can be used to derive a different family of finite dimensional
filters that form a good approximation for the solution of the nonlinear filtering problem. This
$L^2$ structure is particularly suited to be applied to mixture families of distributions, similarly
to how exponential families are well suited to work with the Fisher information metric. In this
work we thus introduce the mixture projection filter, namely the projection filter based on the
direct $L^2$ metric and on a manifold given by a mixture of pre-assigned densities. One key
result we obtain is that the prediction step is given by a linear differential equation, whereas
the correction step can be made exact by updating the basis functions for the tangent space of
the manifold, namely the mixture components, at each observation time.

The exponential projection filter had a clear relationship with the assumed density filters,
as documented in [10]. The mixture projection filter has a clear relationship with earlier
Galerkin-based approaches to non-linear filtering, see for example [3] and p^. However, the 
geometric structure and the exact projection make the method in this paper more general, 
giving the possibility to apply it to manifolds that are more general than the standard mixture 
family. Moreover, in the continuous time observations case, the projection filter is based on
the Stratonovich calculus that is needed to keep the projected dynamics in the tangent space
of the manifold, whereas the Galerkin based projection filter [3] is based on Ito calculus. We
will explore in detail our mixture projection filter based on the direct $L^2$ metric as compared
with the Galerkin methods in future research, where we will also implement the mixture projection
filter equations numerically, both for the simple mixture family and for more general families
to which the Galerkin method cannot be applied. We will also investigate the choice of the
specific mixture family, starting with Gaussian or lognormal mixtures, with possible likelihood
ratio corrections that make the correction step exact and allow for the definition of a rigorous
measure for the filtering error, based on a projection residual.

2 Statistical manifolds 

On the measurable space $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$ we consider a non-negative and
$\sigma$-finite measure $\lambda$, and we define $M(\lambda)$ to be the set of all non-negative
and finite measures $\mu$ which are absolutely continuous w.r.t. $\lambda$, and whose density

$$p_\mu = \frac{d\mu}{d\lambda}$$

is positive $\lambda$-a.e. For simplicity, we restrict ourselves to the case where $\lambda$ is the
Lebesgue measure on $\mathbb{R}^n$. We also assume that the total measure is normalized to one, so
as to represent a probability measure. This in turn implies that $p_\mu$ integrates to one.

In the following, we denote by $H(\lambda)$ the set of all the densities of measures contained in
$M(\lambda)$. Notice that, as all the measures in $M(\lambda)$ are non-negative and finite, we have
that if $p$ is a density in $H(\lambda)$ then $p \in L^1(\lambda)$, that is $\sqrt{p} \in L^2(\lambda)$.
The above remark implies that the set $\mathcal{R}(\lambda) := \{\sqrt{p} : p \in H(\lambda)\}$ of
square roots of densities of $H(\lambda)$ is a subset of $L^2(\lambda)$. Notice that all $\sqrt{p}$
in $\mathcal{R}(\lambda)$ satisfy $\sqrt{p(x)} > 0$, for almost every $x \in \mathbb{R}^n$.

We notice the important point that neither $H(\lambda)$ nor $\mathcal{R}(\lambda)$ are vector
subspaces of $L^1$ or $L^2$ respectively. Hence, we cannot view them as normed subspaces or
topological vector spaces.

We will be able to use the $L^2$ norm to define a metric in $\mathcal{R}$, but we will not be able
to view $\mathcal{R}$ as a normed space.

2.1 The Hellinger distance

The above remarks lead to the definition of the following metric in $\mathcal{R}(\lambda)$, see
Jacod and Shiryayev [21] or Hanzon [19]:
$d_{\mathcal{R}}(\sqrt{p_1}, \sqrt{p_2}) := \|\sqrt{p_1} - \sqrt{p_2}\|$, where $\|\cdot\|$ denotes
the norm of the Hilbert space $L^2(\lambda)$. This leads to the Hellinger metric on $H(\lambda)$
(or $M(\lambda)$), obtained by using the bijection between densities (or measures) and square roots
of densities: if $\mu_1$ and $\mu_2$ are the measures having densities $p_1$ and $p_2$ w.r.t.
$\lambda$, the Hellinger metric is defined as
$d_M(\mu_1, \mu_2) = d_H(p_1, p_2) = d_{\mathcal{R}}(\sqrt{p_1}, \sqrt{p_2})$. It can be shown, see
e.g. [19], that the distance $d_M(\mu_1, \mu_2)$ in $M(\lambda)$ is defined independently of the
particular $\lambda$ we choose as basic measure, as long as both $\mu_1$ and $\mu_2$ are absolutely
continuous w.r.t. $\lambda$. As one can always find a $\lambda$ such that both $\mu_1$ and $\mu_2$
are absolutely continuous w.r.t. $\lambda$ (take for example $\lambda := (\mu_1 + \mu_2)/2$), the
distance is well defined on the set of all finite and positive measures on $(\Omega, \mathcal{F})$.

2.2 The direct $L^2$ distance

There is another possibility for defining a metric in $H$. We consider the following subset of $H$:

$$H_2(\lambda) = H(\lambda) \cap L^2(\lambda)$$

i.e. the set of $L^2$ densities. Notice that here we do not take the square root, but we use the
$L^2$ structure directly on the densities. If we further assume that densities in $H$ are bounded,
then

$$H_2(\lambda) = H(\lambda)$$

since bounded positive functions that are in $L^1$ are also in $L^2$.

This structure leads to the definition of the following metric in $H_2(\lambda)$:
$d_2(p_1, p_2) := \|p_1 - p_2\|$. $H_2$ with this metric is a metric space but, again, it is not a
normed space, since it is not a vector space. We call this metric the direct $L^2$ distance, since
it is taken directly on the densities rather than mapping them to their square roots.
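As a purely illustrative numerical sketch (not part of the original derivation), the two distances can be compared on a grid for a pair of Gaussian densities. The parameter values, grid and helper names below are assumptions chosen for the example; the closed forms for Gaussians are standard and serve only as a cross-check.

```python
import numpy as np

def gaussian(x, mean, std):
    """Gaussian density evaluated at x (array or scalar)."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def l2_norm(f, dx):
    """L2 norm approximated by a Riemann sum (the integrands decay fast)."""
    return np.sqrt(np.sum(f ** 2) * dx)

x = np.linspace(-15.0, 15.0, 6001)
dx = x[1] - x[0]
m1, s1, m2, s2 = 0.0, 1.0, 1.0, 1.5   # illustrative parameters
p1, p2 = gaussian(x, m1, s1), gaussian(x, m2, s2)

hellinger = l2_norm(np.sqrt(p1) - np.sqrt(p2), dx)  # distance between square roots
direct = l2_norm(p1 - p2, dx)                       # direct L2 distance d_2(p1, p2)

# Closed forms for Gaussians, used only as a cross-check.
bc = np.sqrt(2 * s1 * s2 / (s1**2 + s2**2)) * np.exp(-(m1 - m2) ** 2 / (4 * (s1**2 + s2**2)))
hellinger_exact = np.sqrt(2.0 - 2.0 * bc)
cross = gaussian(m1 - m2, 0.0, np.sqrt(s1**2 + s2**2))   # integral of p1 * p2
direct_exact = np.sqrt(1 / (2 * s1 * np.sqrt(np.pi)) + 1 / (2 * s2 * np.sqrt(np.pi)) - 2 * cross)
```

The two numbers differ in general, which is the point of distinguishing the two structures: the Hellinger distance is bounded by $\sqrt{2}$ for probability densities, whereas the direct distance depends on how peaked the densities are.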

2.3 Neither $(H(\lambda), d_H)$ nor $(H_2(\lambda), d_2)$ are Hilbert manifolds

Despite being subsets of $L^2$, neither $(H(\lambda), d_H)$ (or the equivalent
$(\mathcal{R}(\lambda), d_{\mathcal{R}})$) nor $(H_2(\lambda), d_2)$ are locally homeomorphic to
$L^2(\lambda)$, hence they are not manifolds modeled on $L^2(\lambda)$. Indeed, any open set of
$L^2(\lambda)$ contains functions which are negative in a set with positive $\lambda$-measure.
There is no open set of $L^2(\lambda)$ which contains only positive functions such as the functions
of $H_2(\lambda)$ or $\mathcal{R}(\lambda)$.
2.4 Definition of tangent vectors through the $L^2$ structure

Consider an open subset $M$ of $L^2(\lambda)$. Let $x$ be a point of $M$, and let
$\gamma : (-\epsilon, \epsilon) \to M$ be a curve on $M$ around $x$, i.e. a differentiable map
between an open neighborhood of $0 \in \mathbb{R}$ and $M$ such that $\gamma(0) = x$. We can define
the tangent vector to $\gamma$ at $x$ as the Fréchet derivative
$D\gamma(0) : (-\epsilon, \epsilon) \to L^2(\lambda)$, i.e. the linear map defined in $\mathbb{R}$
around $0$ and taking values in $L^2(\lambda)$ such that the following limit holds:

$$\lim_{|h| \to 0} \frac{\|\gamma(h) - \gamma(0) - D\gamma(0) \cdot h\|}{|h|} = 0 .$$

The map $D\gamma(0)$ approximates linearly the change of $\gamma$ around $x$. Let $C_x(M)$ be the
set of all the curves on $M$ around $x$. If we consider the space

$$L_x M := \{ D\gamma(0) : \gamma \in C_x(M) \} ,$$

of tangent vectors to all the possible curves on $M$ around $x$, we obtain again the space
$L^2(\lambda)$. This is due to the fact that for every $v \in L^2(\lambda)$ we can always consider
the straight line $\gamma^v(h) := x + hv$. Since $M$ is open, $\gamma^v$ takes values in $M$ for
$|h|$ small enough. Of course $D\gamma^v(0) = v$, so that indeed $L_x M = L^2(\lambda)$.

2.5 Finite dimensional submanifold embedded in $L^2$

The situation becomes different if we consider an $m$-dimensional manifold $N$ that is a subset
of $L^2$ (and, possibly, a subset of $\mathcal{R}$ or $H_2$ above). As such, it can be endowed with
the topology induced by the $L^2$ norm. Because $N$ is $m$-dimensional, it is also locally
homeomorphic to $\mathbb{R}^m$.

We can consider the induced $L^2$ structure on $N$ as follows: suppose $x \in N$, and define
again

$$L_x N := \{ D\gamma(0) : \gamma \in C_x(N) \} .$$

This is a linear subspace of $L^2(\lambda)$ called the tangent vector space at $x$, which does not
coincide with $L^2(\lambda)$ in general (due to the finite dimension of $N$, this tangent space
will be $m$-dimensional). The set of all tangent vectors at all points $x$ of $N$ is called the
tangent bundle, and will be denoted by $LN$. In our work we shall consider finite dimensional
manifolds $N$ embedded in $L^2(\lambda)$, which are contained in $\mathcal{R}(\lambda)$ or $H_2$
as a set, i.e. $N \subset \mathcal{R}(\lambda) \subset L^2(\lambda)$ or
$N \subset H_2(\lambda) \subset L^2(\lambda)$, so that usually $x = \sqrt{p}$ or $x = p$,
respectively. We analyze the two cases separately.

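2.6 Finite dimensional manifolds $N$ in $(\mathcal{R}, d_{\mathcal{R}})$

If $N$ is $m$-dimensional, it is locally homeomorphic to $\mathbb{R}^m$, and it may be described
locally by a chart: if $\sqrt{p} \in N$, there exists a pair $(S^{1/2}, \phi)$ with $S^{1/2}$ open
neighbourhood of $\sqrt{p}$ in $N$ for the topology induced by $d_{\mathcal{R}}$ and
$\phi : S^{1/2} \to \Theta$ homeomorphism of $S^{1/2}$, with the topology induced by
$d_{\mathcal{R}}$, onto an open subset $\Theta$ of $\mathbb{R}^m$ with the usual topology of
$\mathbb{R}^m$. By considering the inverse map $i$ of $\phi$,

$$i : \Theta \to S^{1/2}, \qquad \theta \mapsto \sqrt{p(\cdot, \theta)}$$

we can express $S^{1/2}$ as

$$i(\Theta) = \{ \sqrt{p(\cdot, \theta)},\ \theta \in \Theta \} = S^{1/2} .$$

We will work only with the single coordinate chart $(S^{1/2}, \phi)$ as it is done in [2]. From the
fact that $(S^{1/2}, \phi)$ is a chart, it follows that

$$\left\{ \frac{\partial i(\cdot, \theta)}{\partial \theta_1}, \cdots, \frac{\partial i(\cdot, \theta)}{\partial \theta_m} \right\}$$

is a set of linearly independent vectors in $L^2(\lambda)$. In such a context, let us see what the
vectors of $L_{\sqrt{p(\cdot, \theta)}} S^{1/2}$ are. We can consider a curve in $S^{1/2}$ around
$\sqrt{p(\cdot, \theta)}$ to be of the form $\gamma : h \mapsto \sqrt{p(\cdot, \theta(h))}$, where
$h \mapsto \theta(h)$ is a curve in $\Theta$ around $\theta$. Then, according to the chain rule,
we compute the following Fréchet derivative:

$$D\gamma(0) = D\sqrt{p(\cdot, \theta(h))}\Big|_{h=0}
= \sum_{k=1}^m \frac{\partial \sqrt{p(\cdot, \theta)}}{\partial \theta_k}\, \dot\theta_k(0)
= \sum_{k=1}^m \frac{1}{2\sqrt{p(\cdot, \theta)}} \frac{\partial p(\cdot, \theta)}{\partial \theta_k}\, \dot\theta_k(0) .$$

We obtain that a basis for the tangent vector space at $\sqrt{p(\cdot, \theta)}$ to the space
$S^{1/2}$ of square roots of densities of $S$ is given by:

$$L_{\sqrt{p(\cdot, \theta)}} S^{1/2} = \mathrm{span}\left\{ \frac{1}{2\sqrt{p(\cdot, \theta)}} \frac{\partial p(\cdot, \theta)}{\partial \theta_1}, \cdots, \frac{1}{2\sqrt{p(\cdot, \theta)}} \frac{\partial p(\cdot, \theta)}{\partial \theta_m} \right\} . \qquad (1)$$

As $i$ is the inverse of a chart, these vectors are actually linearly independent, and they indeed
form a basis of the tangent vector space. One has to be careful, because if this were not true,
the dimension of the above spanned space could drop.

The inner product of any two basis elements is defined, according to the $L^2$ inner product

$$\left\langle \frac{1}{2\sqrt{p(\cdot, \theta)}} \frac{\partial p(\cdot, \theta)}{\partial \theta_i},\ \frac{1}{2\sqrt{p(\cdot, \theta)}} \frac{\partial p(\cdot, \theta)}{\partial \theta_j} \right\rangle
= \frac{1}{4} \int \frac{1}{p(x, \theta)} \frac{\partial p(x, \theta)}{\partial \theta_i} \frac{\partial p(x, \theta)}{\partial \theta_j}\, d\lambda(x)
= \frac{1}{4}\, g_{ij}(\theta) . \qquad (2)$$

This is, up to the numeric factor $\frac{1}{4}$, the Fisher information metric, see for example
[2], [28] and [1]. The matrix $g(\theta) = (g_{ij}(\theta))$ is called the Fisher information matrix.

Next, we introduce the orthogonal projection between any linear subspace $V$ of $L^2(\lambda)$
containing the finite dimensional tangent vector space (1) and the tangent vector space (1) itself.
Let us remember that our basis is not orthogonal, so that we have to project according to the
following formula:

$$\Pi : V \to \mathrm{span}\{w_1, \cdots, w_m\}, \qquad
v \mapsto \sum_{i=1}^m \left[ \sum_{j=1}^m W^{ij} \langle v, w_j \rangle \right] w_i$$

where $\{w_1, \cdots, w_m\}$ are $m$ linearly independent vectors, $W := (\langle w_i, w_j \rangle)$
is the matrix formed by all the possible inner products of such linearly independent vectors, and
$(W^{ij})$ is the inverse of the matrix $W$. In our context $\{w_1, \cdots, w_m\}$ are the vectors
in (1), and of course $W$ is, up to the numeric factor $\frac{1}{4}$, the Fisher information matrix
given by (2). Then we obtain the following projection formula, where $(g^{ij}(\theta))$ is the
inverse of the Fisher information matrix $(g_{ij}(\theta))$:

$$\Pi_\theta : L^2(\lambda) \supseteq V \to \mathrm{span}\left\{ \frac{1}{2\sqrt{p(\cdot, \theta)}} \frac{\partial p(\cdot, \theta)}{\partial \theta_1}, \cdots, \frac{1}{2\sqrt{p(\cdot, \theta)}} \frac{\partial p(\cdot, \theta)}{\partial \theta_m} \right\}$$

$$v \mapsto \sum_{i=1}^m \left[ \sum_{j=1}^m 4\, g^{ij}(\theta) \left\langle v,\ \frac{1}{2\sqrt{p(\cdot, \theta)}} \frac{\partial p(\cdot, \theta)}{\partial \theta_j} \right\rangle \right] \frac{1}{2\sqrt{p(\cdot, \theta)}} \frac{\partial p(\cdot, \theta)}{\partial \theta_i} . \qquad (3)$$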
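The projection onto a span of non-orthogonal vectors can be sketched numerically as follows. This is a minimal illustration under our own assumptions: the three basis functions, the target function `v` and the grid are hypothetical and chosen only to make the Gram-matrix formula concrete; inner products are approximated by Riemann sums.

```python
import numpy as np

def project(v, basis, dx):
    """Orthogonal L2 projection of v onto span(basis) for a non-orthogonal basis.

    basis has shape (m, n_grid); inner products are Riemann sums on the grid.
    Returns the coefficients and the projected function.
    """
    W = basis @ basis.T * dx          # Gram matrix W_ij = <w_i, w_j>
    b = basis @ v * dx                # vector of inner products <v, w_j>
    coeff = np.linalg.solve(W, b)     # sum_j W^{ij} <v, w_j>
    return coeff, coeff @ basis

x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]
# Three linearly independent, non-orthogonal functions (illustrative choice).
basis = np.array([np.exp(-x**2), x * np.exp(-x**2), np.exp(-(x - 1.0) ** 2)])
v = np.exp(-0.5 * (x + 0.5) ** 2)

coeff, pv = project(v, basis, dx)
residual = v - pv                     # should be orthogonal to every basis vector
```

Solving the linear system with the Gram matrix, rather than forming its inverse explicitly, is a numerical convenience; it computes the same coefficients as the formula with $W^{ij}$.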
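Let us go back to the definition of tangent vectors for our statistical manifold. Amari [2] uses
a different representation of tangent vectors to $S$ at $p$. Without exploring all the assumptions
needed, let us say that Amari defines an isomorphism between the actual tangent space and
the vector space

$$\mathrm{span}\left\{ \frac{\partial \log p(\cdot, \theta)}{\partial \theta_1}, \cdots, \frac{\partial \log p(\cdot, \theta)}{\partial \theta_m} \right\} .$$

On this representation of the tangent space, Amari defines a Riemannian metric given by

$$E_{p(\cdot, \theta)}\left\{ \frac{\partial \log p(\cdot, \theta)}{\partial \theta_i} \frac{\partial \log p(\cdot, \theta)}{\partial \theta_j} \right\} ,$$

where $E_p\{\cdot\}$ denotes the expectation w.r.t. the probability density $p$. This is again the
Fisher information metric, and indeed this is the most frequent definition of Fisher metric. In
fact, it is easy to check that

$$E_{p(\cdot, \theta)}\left\{ \frac{\partial \log p(\cdot, \theta)}{\partial \theta_i} \frac{\partial \log p(\cdot, \theta)}{\partial \theta_j} \right\}
= \int \frac{\partial \log p(x, \theta)}{\partial \theta_i} \frac{\partial \log p(x, \theta)}{\partial \theta_j}\, p(x, \theta)\, d\lambda(x)
= \int \frac{1}{p(x, \theta)} \frac{\partial p(x, \theta)}{\partial \theta_i} \frac{\partial p(x, \theta)}{\partial \theta_j}\, d\lambda(x) = g_{ij}(\theta) . \qquad (4)$$

From the above relation and from (2) it is clear that, up to the numeric factor $\frac{1}{4}$, the
Fisher information metric and the Hellinger metric coincide on the two representations of the
tangent space to $S$ at $p(\cdot, \theta)$.

There is another way of measuring how close two densities of $S$ are. Consider the
Kullback-Leibler information between two densities $p$ and $q$ of $H(\lambda)$:

$$K(p, q) := \int \log \frac{p(x)}{q(x)}\, p(x)\, d\lambda(x) = E_p\left\{ \log \frac{p}{q} \right\} .$$

This is not a metric, since it is not symmetric and it does not satisfy the triangular inequality.
When applied to a finite dimensional manifold such as $S$, both the Kullback-Leibler information
and the Hellinger distance are particular cases of $\alpha$-divergence, see [2] for the details.
One can show that the Fisher metric and the Kullback-Leibler information coincide infinitesimally,
up to a constant factor. Indeed, consider the two densities $p(\cdot, \theta)$ and
$p(\cdot, \theta + d\theta)$ of $S$. By expanding in Taylor series, we obtain

$$K(p(\cdot, \theta), p(\cdot, \theta + d\theta))
= -\sum_{i=1}^m E_{p(\cdot, \theta)}\left\{ \frac{\partial \log p(\cdot, \theta)}{\partial \theta_i} \right\} d\theta_i
- \frac{1}{2} \sum_{i,j=1}^m E_{p(\cdot, \theta)}\left\{ \frac{\partial^2 \log p(\cdot, \theta)}{\partial \theta_i \partial \theta_j} \right\} d\theta_i\, d\theta_j + O(|d\theta|^3)$$

$$= \frac{1}{2} \sum_{i,j=1}^m g_{ij}(\theta)\, d\theta_i\, d\theta_j + O(|d\theta|^3) ,$$

where the first-order term vanishes because the score has zero mean. The interested reader is
referred to [1].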
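The infinitesimal agreement between the Kullback-Leibler information and the Fisher quadratic form can be checked numerically. The sketch below is our own illustration for a Gaussian density in (mean, variance) coordinates; it assumes the standard fact that the Fisher matrix in these coordinates is diag(1/v, 1/(2v^2)), and the step direction and grid are arbitrary choices.

```python
import numpy as np

def gaussian(x, mean, var):
    """Gaussian density with the given mean and variance."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def kl(p, q, dx):
    """Kullback-Leibler information K(p, q) by a Riemann sum."""
    return np.sum(p * np.log(p / q)) * dx

x = np.linspace(-12.0, 12.0, 4801)
dx = x[1] - x[0]
mu, v = 0.3, 1.2
g = np.diag([1.0 / v, 1.0 / (2.0 * v**2)])   # Fisher matrix in (mu, v) coordinates

def compare(step):
    """Return K(p_theta, p_{theta + d}) and the quadratic form 0.5 * d^T g d."""
    d = step * np.array([1.0, -0.5])          # displacement in (mu, v)
    k = kl(gaussian(x, mu, v), gaussian(x, mu + d[0], v + d[1]), dx)
    return k, 0.5 * d @ g @ d

k_small, quad_small = compare(1e-2)
k_tiny, quad_tiny = compare(1e-3)
```

Shrinking the step by a factor of ten reduces the gap between the two quantities by roughly a factor of a thousand, consistent with a third-order remainder.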



Example 2.1 (The Gaussian family and the Fisher metric with canonical parameters).
We may consider the Fisher metric for the Gaussian family of densities. The Gaussian
family may be defined as a particular exponential family, represented with canonical parameters
$\theta$, given by

$$\{ p(x, \theta) = \exp(\theta_1 x + \theta_2 x^2 - \psi(\theta)),\ \theta_2 < 0 \}$$

where one has easily

$$\psi(\theta) = \frac{1}{2} \ln\left( \frac{\pi}{-\theta_2} \right) - \frac{\theta_1^2}{4 \theta_2}$$

and the Fisher metric is

$$g(\theta) = \begin{pmatrix} -1/(2\theta_2) & \theta_1/(2\theta_2^2) \\ \theta_1/(2\theta_2^2) & 1/(2\theta_2^2) - \theta_1^2/(2\theta_2^3) \end{pmatrix} .$$

The familiar representation of Gaussian densities is in terms of mean and variance, given
respectively by

$$\mu = -\theta_1/(2\theta_2), \qquad v = \sigma^2 = -1/(2\theta_2) .$$

The Fisher metric is used ideally to compute the distance between two infinitesimally near
points $p(\cdot, \theta)$ and $p(\cdot, \theta + d\theta)$. Informally, and up to a constant
numeric factor, we can write for the squared distance

$$d_H(p(\cdot, \theta), p(\cdot, \theta + d\theta))^2 = (d\theta)^T g(\theta)\, d\theta .$$

Notice that the matrix changes when changing coordinates, whereas the distance must clearly
be the same. Hence if we have another set of coordinates $\eta$ related by a diffeomorphism
$\eta = \eta(\theta)$ to $\theta$, with inverse $\theta = \theta(\eta)$, then clearly

$$d_H(p(\cdot, \eta), p(\cdot, \eta + d\eta))^2 = (d\eta)^T (\partial_\eta \theta(\eta))^T g(\theta(\eta))\, \partial_\eta \theta(\eta)\, d\eta$$

where $\partial_\eta \theta(\eta)$ is the Jacobian matrix of the transformation. It follows that

$$g(\eta) = (\partial_\eta \theta(\eta))^T g(\theta(\eta))\, \partial_\eta \theta(\eta) .$$

Example 2.2 (The Gaussian family and the Fisher metric with expectation parameters).
We may consider the Fisher metric for the Gaussian family of densities in the
parameters $\mu$ and $v$. These are related to the so called expectation parameters $\mu$ and
$v + \mu^2$. With this coordinate system the Fisher metric is much simpler and the matrix is
diagonal, resulting in

$$g(\mu, v) = \frac{1}{v} \begin{pmatrix} 1 & 0 \\ 0 & 1/(2v) \end{pmatrix} .$$

This can be derived either by applying the change of coordinates formula, or from the defining
formula for $g_{ij}$ directly, with the parameters $\theta_1, \theta_2$ replaced by $\mu, v$.
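The change of coordinates formula can be verified numerically for the Gaussian family. The following sketch is ours: it assumes the canonical parametrization $\theta_1 = \mu/v$, $\theta_2 = -1/(2v)$, takes the canonical Fisher matrix as the Hessian of the normalizing exponent, and checks that the transformed matrix is diag(1/v, 1/(2v^2)). The function names and the test point are hypothetical.

```python
import numpy as np

def fisher_canonical(t1, t2):
    """Hessian of psi(theta) = -t1^2/(4 t2) + 0.5*log(pi/(-t2)), valid for t2 < 0."""
    return np.array([
        [-1.0 / (2.0 * t2), t1 / (2.0 * t2**2)],
        [t1 / (2.0 * t2**2), 1.0 / (2.0 * t2**2) - t1**2 / (2.0 * t2**3)],
    ])

def jacobian_theta_wrt_mu_v(mu, v):
    """Jacobian of the map (mu, v) -> theta = (mu / v, -1 / (2 v))."""
    return np.array([[1.0 / v, -mu / v**2],
                     [0.0, 1.0 / (2.0 * v**2)]])

mu, v = 0.7, 1.3                                  # illustrative point
theta1, theta2 = mu / v, -1.0 / (2.0 * v)
J = jacobian_theta_wrt_mu_v(mu, v)
g_mu_v = J.T @ fisher_canonical(theta1, theta2) @ J   # metric in (mu, v) coordinates
```

The off-diagonal entries cancel exactly, which is why the (mean, variance) coordinates are convenient for the Fisher metric.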



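2.7 Finite dimensional manifolds $N$ in $(H_2, d_2)$

Alternatively, if we use $H_2$ instead of $\mathcal{R}$ as a set where $N$ is contained, $N$ can
still be described locally by a chart: if $p \in N$, there exists a pair $(S, \psi)$ with $S$ open
neighbourhood of $p$ in $N$ for the topology induced by $d_2$ and $\psi : S \to \Theta$
homeomorphism of $S$, with the topology induced by $d_2$, onto an open subset $\Theta$ of
$\mathbb{R}^m$ with the usual topology. By considering the inverse map $j$ of $\psi$,

$$j : \Theta \to S, \qquad \theta \mapsto p(\cdot, \theta)$$

we can express $S$ as

$$j(\Theta) = \{ p(\cdot, \theta),\ \theta \in \Theta \} = S .$$

We will work only with the single coordinate chart $(S, \psi)$. From the fact that $(S, \psi)$ is a
chart, it follows that

$$\left\{ \frac{\partial j(\cdot, \theta)}{\partial \theta_1}, \cdots, \frac{\partial j(\cdot, \theta)}{\partial \theta_m} \right\}$$

is a set of linearly independent vectors in $L^2(\lambda)$. In such a context, let us see what the
vectors of $L_{p(\cdot, \theta)} S$ are. We can consider a curve in $S$ around $p(\cdot, \theta)$
to be of the form $\gamma : h \mapsto p(\cdot, \theta(h))$, where $h \mapsto \theta(h)$ is a curve
in $\Theta$ around $\theta$. Then, according to the chain rule, we compute the following Fréchet
derivative:

$$D\gamma(0) = D p(\cdot, \theta(h))\Big|_{h=0}
= \sum_{k=1}^m \frac{\partial p(\cdot, \theta(h))}{\partial \theta_k}\Big|_{h=0} \dot\theta_k(0)
= \sum_{k=1}^m \frac{\partial p(\cdot, \theta)}{\partial \theta_k}\, \dot\theta_k(0) .$$

We obtain that a basis for the tangent vector space at $p(\cdot, \theta)$ to the space $S$ is
given by:

$$L_{p(\cdot, \theta)} S = \mathrm{span}\left\{ \frac{\partial p(\cdot, \theta)}{\partial \theta_1}, \cdots, \frac{\partial p(\cdot, \theta)}{\partial \theta_m} \right\} . \qquad (5)$$

As $j$ is the inverse of a chart, these vectors are actually linearly independent, and they indeed
form a basis of the tangent vector space. One has to be careful, because if this were not true,
the dimension of the above spanned space could drop.

The inner product of any two basis elements is defined, according to the $L^2$ inner product

$$\left\langle \frac{\partial p(\cdot, \theta)}{\partial \theta_i},\ \frac{\partial p(\cdot, \theta)}{\partial \theta_j} \right\rangle
= \int \frac{\partial p(x, \theta)}{\partial \theta_i} \frac{\partial p(x, \theta)}{\partial \theta_j}\, d\lambda(x) = h_{ij}(\theta) . \qquad (6)$$

This is different from the Fisher information metric. The matrix $h(\theta) = (h_{ij}(\theta))$ is
called the direct $L^2$ metric.

Next, we introduce the orthogonal projection between any linear subspace $V$ of $L^2(\lambda)$
containing the finite dimensional tangent vector space (5) and the tangent vector space (5) itself:

$$\Pi_\theta : L^2(\lambda) \supseteq V \to \mathrm{span}\left\{ \frac{\partial p(\cdot, \theta)}{\partial \theta_1}, \cdots, \frac{\partial p(\cdot, \theta)}{\partial \theta_m} \right\}$$

$$v \mapsto \sum_{i=1}^m \left[ \sum_{j=1}^m h^{ij}(\theta) \left\langle v,\ \frac{\partial p(\cdot, \theta)}{\partial \theta_j} \right\rangle \right] \frac{\partial p(\cdot, \theta)}{\partial \theta_i} , \qquad (7)$$

where $(h^{ij}(\theta))$ is the inverse of the matrix $(h_{ij}(\theta))$.
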
Example 2.3 (The Gaussian family and the direct $L^2$ metric in canonical parameters).
We may consider the $L^2$ metric for the Gaussian family of densities introduced earlier.
The $L^2$ metric is

$$h(\theta) = \frac{1}{8} \sqrt{\frac{2}{-\pi \theta_2}} \begin{pmatrix} 1 & \dfrac{\theta_1}{-\theta_2} \\[2mm] \dfrac{\theta_1}{-\theta_2} & \dfrac{3}{4(-\theta_2)} + \dfrac{\theta_1^2}{\theta_2^2} \end{pmatrix}$$

and, as expected, it is different from the Fisher metric seen earlier.

Example 2.4 (The Gaussian family and the direct $L^2$ metric in expectation parameters).
We may consider the $L^2$ metric for the Gaussian family in the coordinates $\mu, v$. The
$L^2$ metric is

$$h(\mu, v) = \frac{1}{8 v \sqrt{v \pi}} \begin{pmatrix} 2 & 0 \\ 0 & \dfrac{3}{4v} \end{pmatrix}$$

and, as expected, it is different from the $\mu, v$ Fisher metric seen earlier, although it is
still a diagonal matrix.
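The direct $L^2$ metric of the Gaussian family in the coordinates (mean, variance) can be checked by quadrature. The sketch below is our own verification, not part of the source: it evaluates the integrals of products of parameter derivatives on a grid and compares them with the closed form obtained by direct Gaussian integration, namely diag(2, 3/(4v)) divided by 8 v sqrt(pi v). The test point and grid are arbitrary.

```python
import numpy as np

x = np.linspace(-14.0, 14.0, 5601)
dx = x[1] - x[0]
mu, v = 0.4, 1.1                                  # illustrative point

p = np.exp(-0.5 * (x - mu) ** 2 / v) / np.sqrt(2.0 * np.pi * v)
dp_dmu = p * (x - mu) / v                         # derivative of p w.r.t. the mean
dp_dv = p * ((x - mu) ** 2 / (2.0 * v**2) - 1.0 / (2.0 * v))  # w.r.t. the variance
grads = np.array([dp_dmu, dp_dv])

h_numeric = grads @ grads.T * dx                  # h_ij = integral of d_i p * d_j p
h_closed = np.diag([2.0, 3.0 / (4.0 * v)]) / (8.0 * v * np.sqrt(np.pi * v))
g_fisher = np.diag([1.0 / v, 1.0 / (2.0 * v**2)])  # Fisher matrix, for comparison
```

Both matrices are diagonal in these coordinates, but their entries scale differently with the variance, so the two geometries are genuinely different.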
3 Exponential families and Mixture families

Earlier research in [9], [10], [6] and [7] illustrated in detail how the Hellinger distance and the
related Fisher information metric are ideal tools when using the projection onto exponential
families of densities. This idea was first sketched by Hanzon in [18]. The above references
illustrate this by applying the above framework to the infinite dimensional stochastic PDE
describing the optimal solution of the nonlinear filtering problem. This generates an approximate
filter that is locally the closest filter in Fisher metric to the optimal one. The use of
exponential families allows the correction step in the filtering algorithm to become exact, so
that only the prediction step is approximated. Furthermore, and independently of the filtering
application, exponential families and the Fisher metric are known to interact well. For
example, the Fisher metric is obtained by double differentiation of the normalizing exponent
$\psi(\theta)$ in the exponential family and has a straightforward link with the expectation
parameters. See for example [5].

The study of the projection filter for exponential families has been carried out in detail in
the above references, especially [13], [11] and [10].

However, besides exponential families, there is another general framework that is powerful
in modeling probability densities, and this is the mixture family. Mixture distributions are
ubiquitous in statistics and may account for important stylized features such as skewness,
multi-modality and fat tails.

We define a mixture family as follows. Suppose we are given $m + 1$ fixed square integrable
probability densities in $H_2$, say $q = [q_1, \cdots, q_{m+1}]^T$. Suppose we define the following
space of probability densities:

$$S^M(q) = \{ \theta_1 q_1 + \theta_2 q_2 + \cdots + \theta_m q_m + (1 - \theta_1 - \cdots - \theta_m)\, q_{m+1},\ \theta_i \ge 0 \text{ for all } i,\ \theta_1 + \cdots + \theta_m < 1 \}$$

For convenience, define the transformation

$$\hat\theta(\theta) := [\theta_1, \theta_2, \cdots, \theta_m, 1 - \theta_1 - \theta_2 - \cdots - \theta_m]^T$$

for all $\theta$. We will often write $\hat\theta$ instead of $\hat\theta(\theta)$ for brevity.
With this definition,

$$S^M(q) = \{ \hat\theta(\theta)^T q,\ \theta_i \ge 0 \text{ for all } i,\ \theta_1 + \cdots + \theta_m < 1 \}$$

We will occasionally refer to this manifold of densities as the "Simple Mixture" family. While
for exponential families the Hellinger distance and the related Fisher metric are ideal, given
also the expression (1), for mixture families they are less than ideal. For example, the
calculation of the Fisher information matrix $g(\theta)$ becomes cumbersome, and the related
projection is quite convoluted. Instead, if we consider the $L^2$ distance and the related
structure, the metric $h(\theta)$ and the related projection become very simple. Indeed, one can
immediately check from the definition of $h$ that for the mixture family we have

$$\frac{\partial p(\cdot, \theta)}{\partial \theta_i} = q_i - q_{m+1}$$

and

$$h_{ij}(\theta) = \int (q_i(x) - q_{m+1}(x)) (q_j(x) - q_{m+1}(x))\, d\lambda(x) =: h_{ij}$$

i.e., the $L^2$ metric (and matrix) does not depend on the specific point $\theta$ of the manifold.
The same holds for the tangent space at $p(\cdot, \theta)$, which is given by

$$L_{p(\cdot, \theta)} S = \mathrm{span}\{ q_1 - q_{m+1},\ q_2 - q_{m+1},\ \cdots,\ q_m - q_{m+1} \}$$

Also the $L^2$ projection becomes particularly simple:

$$\Pi_\theta : L^2(\lambda) \supseteq V \to \mathrm{span}\{ q_1 - q_{m+1},\ q_2 - q_{m+1},\ \cdots,\ q_m - q_{m+1} \}$$

$$v \mapsto \sum_{i=1}^m \left[ \sum_{j=1}^m h^{ij} \langle v,\ q_j - q_{m+1} \rangle \right] (q_i - q_{m+1}) . \qquad (8)$$

It is therefore worthwhile to try and apply the $L^2$ metric and the related structure to the
projection of the infinite dimensional filter onto the mixture family above.
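To make the constant metric and the projection for the simple mixture family concrete, here is a small numerical sketch. The three Gaussian components, the grid and the parameter vectors are our own illustrative assumptions; the code only instantiates the formulas for $h_{ij}$ and for the projection coefficients.

```python
import numpy as np

x = np.linspace(-12.0, 12.0, 4801)
dx = x[1] - x[0]

def gaussian(mean, std):
    """Gaussian density on the grid x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

# m + 1 = 3 fixed components q_1, q_2, q_3 (illustrative choice).
q = np.array([gaussian(-1.0, 0.8), gaussian(0.5, 1.0), gaussian(2.0, 1.2)])
diffs = q[:-1] - q[-1]                     # tangent vectors q_i - q_{m+1}
h = diffs @ diffs.T * dx                   # constant direct L2 metric h_ij

def mixture(theta):
    """Density theta_hat(theta)^T q of the simple mixture family."""
    theta_hat = np.append(theta, 1.0 - np.sum(theta))
    return theta_hat @ q

def project(v):
    """Coefficients of the L2 projection of v onto span{q_i - q_{m+1}}."""
    return np.linalg.solve(h, diffs @ v * dx)

theta_a, theta_b = np.array([0.2, 0.5]), np.array([0.6, 0.1])
delta = mixture(theta_b) - mixture(theta_a)  # difference lies in the tangent space
coeff = project(delta)                       # recovers theta_b - theta_a
```

Because the family is affine in $\theta$, the squared direct distance between two members equals the quadratic form of the parameter difference in the constant matrix $h$, which the test below also checks.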



4 The nonlinear filtering problem 

In order to present the key geometric ideas without being overwhelmed by technicalities on 
stochastic PDEs, we consider the filtering problem with continuous time state and discrete time 
observations, and in this setup we take a scalar system. We will consider multi-dimensional 
systems later on, in the case with continuous time observations. 

In this model, the state process is the solution of a continuous time stochastic differential
equation

$$dX_t = f_t(X_t)\, dt + \sigma_t(X_t)\, dW_t ,$$

but only discrete-time observations are available

$$Z_n = h(X_{t_n}) + V_n ,$$

at times $0 = t_0 < t_1 < \cdots < t_n < \cdots$ regularly sampled, where $\{V_n,\ n \ge 0\}$ is a
Gaussian white noise sequence independent of $\{X_t,\ t \ge 0\}$.

The nonlinear filtering problem consists in finding the conditional density $p_n(x)$ of the state
$X_{t_n}$ given the observations up to time $t_n$, i.e. such that
$P[X_{t_n} \in dx \mid \mathcal{Z}_n] = p_n(x)\, dx$, where
$\mathcal{Z}_n := \sigma(Z_0, \cdots, Z_n)$. We define also the prediction conditional density
$p_n^-(x)\, dx = P[X_{t_n} \in dx \mid \mathcal{Z}_{n-1}]$. The sequence $\{p_n,\ n \ge 0\}$
satisfies a recurrent equation, and the transition from $p_{n-1}$ to $p_n$ is decomposed in two
steps, as explained for example in [22].

There is first a prediction step: between time $t_{n-1}$ and $t_n$, we solve the Fokker-Planck
equation

$$\frac{\partial p_t^n}{\partial t} = \mathcal{L}_t^* p_t^n , \qquad p_{t_{n-1}}^n = p_{n-1} \qquad (9)$$

where the forward diffusion operator $\mathcal{L}_t^*$ is defined as

$$\mathcal{L}_t^* \phi = -\frac{\partial}{\partial x} [f_t\, \phi] + \frac{1}{2} \frac{\partial^2}{\partial x^2} [\sigma_t^2\, \phi] ,$$

while its dual backward diffusion operator $\mathcal{L}_t$ is defined as

$$\mathcal{L}_t = f_t \frac{\partial}{\partial x} + \frac{1}{2} \sigma_t^2 \frac{\partial^2}{\partial x^2} .$$

The solution at final time $t_n$ defines the prediction conditional density $p_n^- = p_{t_n}^n$.
We have then a second step, the correction step:

At time $t_n$, the newly arrived observation $Z_n$ is combined with the prediction conditional
density $p_n^-$ via the Bayes rule

$$p_n(x) = c_n\, \Psi_n(x)\, p_n^-(x) , \qquad (10)$$

where $c_n$ is a normalizing constant, and $\Psi_n(x)$ denotes the likelihood function for the
estimation of $X_{t_n}$ based on the observation $Z_n$ only, i.e.

$$\Psi_n(x) := \exp\left\{ -\frac{1}{2} |Z_n - h(x)|^2 \right\} . \qquad (11)$$
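The correction step can be illustrated on a grid. The sketch below is a minimal example under our own assumptions: a Gaussian prediction density, a linear observation function and unit-variance observation noise, so that the result can be compared with the familiar Kalman update. The function name `correct` and all numbers are hypothetical.

```python
import numpy as np

x = np.linspace(-15.0, 15.0, 6001)
dx = x[1] - x[0]

def correct(prior, z, obs_fn):
    """Bayes correction: posterior = c * likelihood * prior, unit-variance noise."""
    likelihood = np.exp(-0.5 * (z - obs_fn(x)) ** 2)
    unnormalized = likelihood * prior
    return unnormalized / (np.sum(unnormalized) * dx)   # division supplies c_n

m0, P0, z = 1.0, 2.0, 2.5                                # illustrative numbers
prior = np.exp(-0.5 * (x - m0) ** 2 / P0) / np.sqrt(2.0 * np.pi * P0)
posterior = correct(prior, z, lambda s: s)               # linear observation h(x) = x

post_mean = np.sum(x * posterior) * dx
post_var = np.sum((x - post_mean) ** 2 * posterior) * dx
```

In this linear Gaussian special case the posterior mean and variance agree with the Kalman formulas, mean m0 + K (z - m0) and variance (1 - K) P0 with gain K = P0 / (P0 + 1); for a nonlinear observation function the same routine applies but no closed form is available in general.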
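5 The mixture projection filter (MPF)

We now introduce the mixture projection filter.

We will now work on the prediction step first, in order to derive the projected version of the
Fokker-Planck equation, living in the manifold $S^M$. We adopt the following technique. Take a
curve in the mixture family $S^M$,

$$t \mapsto p(\cdot, \theta(t))$$

and notice that the left hand side of the Fokker-Planck equation for this density would read

$$\frac{\partial p(\cdot, \theta(t))}{\partial t} = \sum_{i=1}^m (q_i - q_{m+1}) \frac{d}{dt} \theta_i(t) ,$$

and project the right hand side of the Fokker-Planck equation as

$$\Pi_\theta [\mathcal{L}_t^* p(\cdot, \theta)] = \sum_{i=1}^m \left[ \sum_{j=1}^m h^{ij} \langle \mathcal{L}_t^* p(\cdot, \theta),\ q_j - q_{m+1} \rangle \right] (q_i - q_{m+1})$$

$$= \sum_{i=1}^m \left[ \sum_{j=1}^m h^{ij} \langle p(\cdot, \theta),\ \mathcal{L}_t (q_j - q_{m+1}) \rangle \right] (q_i - q_{m+1})$$

where we used integration by parts in the last step. Now equating the two sides we obtain

$$\sum_{i=1}^m (q_i - q_{m+1}) \frac{d}{dt} \theta_i(t) = \sum_{i=1}^m \left[ \sum_{j=1}^m h^{ij} \langle p(\cdot, \theta(t)),\ \mathcal{L}_t (q_j - q_{m+1}) \rangle \right] (q_i - q_{m+1})$$

which yields the ordinary differential equation for the parameters $\theta$ of the projected
density:

$$\frac{d}{dt} \theta_i(t) = \sum_{j=1}^m h^{ij} \langle p(\cdot, \theta(t)),\ \mathcal{L}_t (q_j - q_{m+1}) \rangle .$$

Now, by taking into account the structure of $p(\cdot, \theta)$ and the fact that such densities
are linear in $\theta$, we see that the above equation is a linear differential equation:

$$\frac{d}{dt} \theta_i(t) = \sum_{j=1}^m h^{ij} \left[ \sum_{k=1}^m \theta_k \langle q_k,\ \mathcal{L}_t (q_j - q_{m+1}) \rangle + (1 - \theta_1 - \cdots - \theta_m) \langle q_{m+1},\ \mathcal{L}_t (q_j - q_{m+1}) \rangle \right] .$$

If we define, for two vector functions $f$ and $g$, the matrix $\langle f, g \rangle$ and the
vector $\mathcal{L}_t f$ as

$$\langle f, g \rangle_{ij} := \langle f_i, g_j \rangle , \qquad (\mathcal{L}_t f)_i := \mathcal{L}_t f_i ,$$

then we can write the above ODE in compact form as

$$\frac{d}{dt} \theta(t) = h^{-1} \langle \mathcal{L}_t (q_{1:m} - 1_m q_{m+1}),\ q \rangle\, \hat\theta(\theta(t)) \qquad (12)$$

where $q_{1:m}$ is the vector with the first $m$ components of $q$, and $1_m$ is an
$m$-dimensional (column) vector of ones.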
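The projected prediction equation can be sketched numerically as follows. Everything specific in this example is an assumption made for illustration: an Ornstein-Uhlenbeck signal with drift `-a * x` and constant diffusion `s`, three Gaussian mixture components, a Riemann-sum inner product and a classical Runge-Kutta integrator. The last lines check the Galerkin-type orthogonality that the derivation implies, using the forward operator on the left and the projected ODE on the right.

```python
import numpy as np

a, s = 1.0, 0.8                       # assumed OU signal dX = -a X dt + s dW
means = np.array([-1.0, 0.0, 1.5])    # assumed Gaussian components q_1, q_2, q_3
stds = np.array([0.7, 1.0, 0.9])
x = np.linspace(-12.0, 12.0, 4801)
dx = x[1] - x[0]

z = (x[None, :] - means[:, None]) / stds[:, None]
Q = np.exp(-0.5 * z**2) / (stds[:, None] * np.sqrt(2.0 * np.pi))  # q_k
dQ = -z / stds[:, None] * Q                                        # q_k'
d2Q = (z**2 - 1.0) / stds[:, None] ** 2 * Q                        # q_k''

D = Q[:-1] - Q[-1]                                  # q_j - q_{m+1}
h = D @ D.T * dx                                    # direct L2 metric
# Backward operator L phi = f phi' + 0.5 s^2 phi'' with f(x) = -a x.
LD = -a * x * (dQ[:-1] - dQ[-1]) + 0.5 * s**2 * (d2Q[:-1] - d2Q[-1])
A = LD @ Q.T * dx                                   # A_jk = <L(q_j - q_{m+1}), q_k>

def theta_hat(theta):
    """Append the last mixture weight 1 - sum(theta)."""
    return np.append(theta, 1.0 - np.sum(theta))

def rhs(theta):
    """Right-hand side of the projected ODE in compact form."""
    return np.linalg.solve(h, A @ theta_hat(theta))

def rk4(theta, dt, n_steps):
    """Classical fourth-order Runge-Kutta integration of the parameter ODE."""
    for _ in range(n_steps):
        k1 = rhs(theta)
        k2 = rhs(theta + 0.5 * dt * k1)
        k3 = rhs(theta + 0.5 * dt * k2)
        k4 = rhs(theta + dt * k3)
        theta = theta + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return theta

theta0 = np.array([0.3, 0.4])
theta1 = rk4(theta0, dt=1e-3, n_steps=200)

# Check at theta0: <L* p, q_j - q_{m+1}> equals sum_i h_ji * d(theta_i)/dt.
w = theta_hat(theta0)
p, dp, d2p = w @ Q, w @ dQ, w @ d2Q
Lstar_p = a * p + a * x * dp + 0.5 * s**2 * d2p     # -(f p)' + 0.5 s^2 p''
check_lhs = D @ Lstar_p * dx
check_rhs = h @ rhs(theta0)
```

Since the right-hand side is affine in the parameters with a constant matrix, a matrix exponential could replace the Runge-Kutta loop when the coefficients do not depend on time; the loop is kept here because it also covers time-dependent coefficients.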

In [9] and [10] it is shown that, by carefully choosing the exponential family, the Fisher metric
exponential projection filter makes the correction step exact. In the mixture framework under
the $L^2$ metric we are using now, this is harder to achieve unless we are willing to redefine the
manifold at every correction step. Let us therefore focus on the correction step first. Suppose
we are in $[t_{n-1}, t_n)$ and we obtained a prediction for the density up to $t_n^-$, whose
parameter we call $\theta_n^-$. At $t_n$ a new observation $Z_n$ arrives and we update the density.
Substituting the prediction $p(\cdot, \theta_n^-)$ into formula (10), we observe that the resulting
density leaves the original mixture family $S^M(q)$. The updated density at $t_n$ is

$$c_n\, \Psi_n(x)\, p(x, \theta_n^-) = c_n\, \Psi_n(x)\, \hat\theta(\theta_n^-)^T q(x)$$

and is outside $S^M(q)$. However, we may keep the update step exact by re-defining the basis
functions $q$ as follows.

Suppose that we change basis functions at every discrete date observation step. The first
basis function vector is $q^0$, then at update time $t_1$ we will select a new vector of basis
functions $q^1$, and so on. At every point in time we keep the vector $m + 1$ dimensional. Suppose
the basis functions in $[t_{n-1}, t_n)$ are $q^{n-1}$. We run the prediction step up to $t_n^-$,
getting $\theta_n^-$. At time $t_n$, we define the new basis functions as

$$q_i^n(x) := c_{i,n}\, \Psi_n(x)\, q_i^{n-1}(x) \quad \text{for all } i = 1, \ldots, m + 1$$

where $c_{i,n}$ is the normalizing constant for the density on the right hand side. Every $q_i^n$
is a normalized density and we can define a mixture of such densities as the new space. In this
case, the correction step amounts to the following, at $t_n$:

Correction Step:

At $t_n$: the parameters are updated to $\theta_n$ and the new manifold is $S^M(q^n)$.
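The effect of re-defining the basis functions can be illustrated on a grid. The sketch below is our own bookkeeping under the definitions above, not a formula quoted from the text: since each new component carries its own normalizing constant, matching the Bayes update inside the new family requires weights proportional to the predicted weights divided by $c_{i,n}$. The components, weights, observation value and linear observation function are all assumptions made for the example.

```python
import numpy as np

x = np.linspace(-12.0, 12.0, 4801)
dx = x[1] - x[0]

def gaussian(mean, std):
    """Gaussian density on the grid x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

q_old = np.array([gaussian(-1.0, 0.8), gaussian(0.5, 1.0), gaussian(2.0, 1.2)])
theta_hat_pred = np.array([0.2, 0.5, 0.3])       # predicted weights, summing to one
z_obs = 1.0                                       # illustrative observation
likelihood = np.exp(-0.5 * (z_obs - x) ** 2)      # likelihood for h(x) = x (assumed)

# New basis: each component is multiplied by the likelihood and renormalized.
evidence = (likelihood * q_old).sum(axis=1) * dx  # equals 1 / c_{i,n}
q_new = likelihood * q_old / evidence[:, None]

# Weights that reproduce the exact Bayes update inside the new family.
theta_hat_new = theta_hat_pred * evidence / np.sum(theta_hat_pred * evidence)

# Direct Bayes update of the predicted mixture, for comparison.
posterior = likelihood * (theta_hat_pred @ q_old)
posterior /= np.sum(posterior) * dx
```

With these weights the mixture of the new components coincides with the Bayes posterior on the grid, so no projection error is introduced at the correction step in this sketch.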
We may now focus on the prediction step. 

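Before doing so, it is important to notice that the metric changes as well when we change
the manifold, so that it is safe to index $h$ as follows:

$$h_{ij}^n = \int (q_i^n(x) - q_{m+1}^n(x)) (q_j^n(x) - q_{m+1}^n(x))\, d\lambda(x)$$

Prediction step: between time $t_{n-1}$ and $t_n$, we solve the ODEs

$$\frac{d}{dt} \theta_t^n = (h^{n-1})^{-1} \langle \mathcal{L}_t (q_{1:m}^{n-1} - 1_m q_{m+1}^{n-1}),\ q^{n-1} \rangle\, \hat\theta(\theta_t^n) , \qquad \theta_{t_{n-1}}^n = \theta_{n-1} .$$

The solution at final time $t_n$ defines the prediction parameters $\theta_n^- = \theta_{t_n}^n$.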

6 Relationship with Galerkin methods 

Consider the prediction step in the above section. This is the same step we would have obtained
through Galerkin methods, see for example [3].

In [3], the Galerkin method is applied to the filtering problem with continuous time obser- 
vations. We will address the continuous time observations setup in the next section. 

Here we keep discrete time observations and we show the Galerkin approximation on the 
prediction step. 

The Galerkin approximation is obtained by approximating the exact solution of the Fokker-Planck
equation with a function of the form

$$\tilde{p}_t(x) := \sum_{i=1}^{m+1} c_i(t)\, \phi_i(x) , \qquad (13)$$

see for example [3] for more details. The method works by replacing the exact solution of the
Fokker-Planck equation with the solution of the equations

$$\left\langle -\frac{\partial p_t}{\partial t} + \mathcal{L}_t^* p_t ,\ \xi \right\rangle = 0$$

for a suitable family of smooth test functions $\xi$. By using the approximation (13) in this
last expression, and by taking $\xi = \phi_j$ for $j = 1, \ldots, m + 1$, and finally by setting

$$c_i(t) = \theta_i(t) \text{ and } \phi_i(x) = q_i(x) - q_{m+1}(x) \text{ for } i = 1, \ldots, m , \qquad c_{m+1}(t) = 1 ,\ \phi_{m+1}(x) = q_{m+1}(x)$$

we can see that the method provides exactly Equation (12). Therefore, for simple mixture
families the $L^2$ projection filter prediction step will coincide with the Galerkin method based
prediction step.

However, this holds only for the case where the manifold $S$ on which we project is the simple
mixture family. More complex families, such as the ones we will hint at in the continuous
observation case, will not allow for a Galerkin-based filter and only the $L^2$ projection filter
can be defined there. Furthermore, even under the simple mixture family, in the continuous
observations case there is a further fundamental difference. Our $L^2$ projection filter in the
continuous time observations case will be different from the Galerkin projection filter in [3]
because we use Stratonovich calculus to project the Kushner-Stratonovich equation in $L^2$ metric.
In [3] the Ito version of the Kushner-Stratonovich equation is used instead, but since Ito calculus
does not work on manifolds, due to the second order term moving the dynamics out of the
tangent space (see for example [6]), we use the Stratonovich version instead. The Ito-based and
Stratonovich-based Galerkin projection filters will therefore differ for simple mixture families,
and again, only the second one can be defined for manifolds of densities beyond the simplest
mixture family. A particularly important manifold for which only the $L^2$ based filter can be
defined is a manifold that makes the correction step exact also in continuous time. For such a
family one can define a rigorous measure of the filtering error in $L^2$ norm, which is impossible
to obtain with the standard Galerkin method. This will be made explicit in future work.



7 The Filtering Problem with continuous-time observations 

In the above part of the paper we decided to take discrete time observations in order to limit 
technicalities. In this section we consider a continuous time framework both for the observations 
Y and for the signal X, and we allow both to be multi-dimensional processes. 

$$dX_t = f_t(X_t)\,dt + \sigma_t(X_t)\,dW_t, \qquad X_0,$$

$$dY_t = h_t(X_t)\,dt + dV_t, \qquad Y_0 = 0. \qquad (14)$$

These equations are Ito stochastic differential equations (SDE's). In the continuous observations 
case we shall use both Ito SDE's (for example for the signal X) and Stratonovich (Str) SDE's 
(when dealing with manifolds and projections). The Str form will be denoted by the presence 
of the symbol '$\circ$' between the diffusion coefficient and the Brownian motion of an SDE. The 
use of Str SDE's is necessary in order to be able to deal with stochastic calculus on manifolds, 
since in general one does not know how to interpret the second order terms arising in Ito's 
calculus in terms of manifold structures. The interested reader is referred to [17]. 
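
To make the difference between the two interpretations concrete, the sketch below integrates the scalar equations $dX = X\,dW$ (Ito) and $dX = X \circ dW$ (Str) along one common Brownian path. The test equation, the step count, the seed and the function name are assumptions chosen for illustration only. The Euler-Maruyama scheme approximates the Ito solution $\exp(W_t - t/2)$, whereas the Heun scheme approximates the Str solution $\exp(W_t)$, which is the one obeying the ordinary chain rule.

```python
import math
import random

def simulate_linear_noise(n_steps=10000, T=1.0, seed=1):
    """Integrate dX = X dW (Ito, Euler-Maruyama) and dX = X o dW (Str, Heun)
    along one common Brownian path, starting from X_0 = 1."""
    rng = random.Random(seed)
    dt = T / n_steps
    x_ito, x_str, w = 1.0, 1.0, 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        # Euler-Maruyama: coefficient evaluated at the left endpoint (Ito integral)
        x_ito += x_ito * dw
        # Heun: average of the coefficient at the left endpoint and at the predictor
        pred = x_str + x_str * dw
        x_str += 0.5 * (x_str + pred) * dw
        w += dw
    return x_ito, x_str, w

x_ito, x_str, w = simulate_linear_noise()
ito_exact = math.exp(w - 0.5)   # closed-form Ito solution at T = 1
str_exact = math.exp(w)         # closed-form Str solution at T = 1
```

The two numerical solutions differ by a factor close to $e^{1/2}$ at $T = 1$, which is the effect of the second order Ito correction term.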

In (14), the unobserved state process $\{X_t,\ t \ge 0\}$ and the observation process $\{Y_t,\ t \ge 0\}$ 
take values in $\mathbb{R}^n$ and $\mathbb{R}^d$ respectively, and the noise processes $\{W_t,\ t \ge 0\}$ and $\{V_t,\ t \ge 0\}$ 
are two Brownian motions, the latter taking values in $\mathbb{R}^d$, with covariance matrices 
$Q_t$ and $R_t$ respectively. We assume that $R_t$ is invertible for all $t \ge 0$, which implies that, 
without loss of generality, we can assume that $R_t = I$ for all $t \ge 0$. Finally, the initial state 
$X_0$ and the noise processes $\{W_t,\ t \ge 0\}$ and $\{V_t,\ t \ge 0\}$ are assumed to be independent. We 
assume that the initial state $X_0$ has a density $p_0$ w.r.t. the Lebesgue measure $\lambda$ on $\mathbb{R}^n$, and 
has finite moments of any order, and we make the following assumptions on the coefficients $f_t$, 
$a_t := \sigma_t Q_t \sigma_t^T$, and $h_t$ of the system (14): 

(A) Local Lipschitz continuity : for all $R > 0$, there exists $K_R > 0$ such that 

$$|f_t(x) - f_t(x')| \le K_R\,|x - x'| \quad\text{and}\quad \|a_t(x) - a_t(x')\| \le K_R\,|x - x'|,$$

for all $t \ge 0$, and for all $x, x' \in B_R$, the ball of radius $R$. 

(B) Non-explosion : there exists $K > 0$ such that 

$$x^T f_t(x) \le K\,(1 + |x|^2) \quad\text{and}\quad \mathrm{trace}\ a_t(x) \le K\,(1 + |x|^2),$$

for all $t \ge 0$, and for all $x \in \mathbb{R}^n$. 

(C) Polynomial growth : there exist $K > 0$ and $r \ge 0$ such that 

$$|h_t(x)| \le K\,(1 + |x|^r),$$

for all $t \ge 0$, and for all $x \in \mathbb{R}^n$. 

Under assumptions (A) and (B), there exists a unique solution $\{X_t,\ t \ge 0\}$ to the state 
equation, see for example [25], and $X_t$ has finite moments of any order. Under the additional 
assumption (C) the following finite energy condition holds 

$$E \int_0^T |h_t(X_t)|^2\,dt < \infty, \quad\text{for all } T \ge 0.$$
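
The following sketch simulates a scalar instance of the system (14) by the Euler-Maruyama scheme and accumulates the pathwise integral $\int_0^T |h_t(X_t)|^2\,dt$, whose expectation is the quantity appearing in the finite energy condition. The coefficients (linear drift, unit diffusion, linear observation function), the unit covariances, the step sizes and the function names are hypothetical choices made only for this illustration.

```python
import math
import random

def simulate_energy(f, sigma, h, x0, T, n_steps, seed=0):
    """Euler-Maruyama path of a scalar signal/observation pair.
    Returns (X_T, Y_T, integral of |h_t(X_t)|^2 dt along the path)."""
    rng = random.Random(seed)
    dt = T / n_steps
    x, y, energy = x0, 0.0, 0.0
    for k in range(n_steps):
        t = k * dt
        dW = rng.gauss(0.0, math.sqrt(dt))
        dV = rng.gauss(0.0, math.sqrt(dt))
        hx = h(t, x)
        energy += hx * hx * dt                  # left-endpoint rule for the time integral
        y += hx * dt + dV                       # dY = h(X) dt + dV, assuming R = 1
        x += f(t, x) * dt + sigma(t, x) * dW    # dX = f(X) dt + sigma(X) dW, assuming Q = 1
    return x, y, energy

def mean_energy(n_paths=200, **kw):
    """Monte Carlo estimate of E int_0^T |h_t(X_t)|^2 dt over independent paths."""
    return sum(simulate_energy(seed=s, **kw)[2] for s in range(n_paths)) / n_paths

# hypothetical coefficients: linear drift, unit diffusion, linear observation function
model = dict(f=lambda t, x: -x, sigma=lambda t, x: 1.0, h=lambda t, x: x,
             x0=0.0, T=1.0, n_steps=500)
estimate = mean_energy(**model)
```

These coefficients satisfy assumptions (A), (B) and (C), so the Monte Carlo estimate is expected to stay bounded as the number of paths grows.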

The nonlinear filtering problem consists in finding the conditional probability distribution 
$\pi_t$ of the state $X_t$ given the observations up to time $t$, i.e. $\pi_t(dx) := P[X_t \in dx \mid \mathcal{Y}_t]$, where 
$\mathcal{Y}_t := \sigma(Y_s,\ 0 \le s \le t)$. Since the finite energy condition holds, it follows from Fujisaki, 
Kallianpur and Kunita that $\{\pi_t,\ t \ge 0\}$ satisfies the Kushner-Stratonovich equation, i.e. 
for any smooth and compactly supported test function $\phi$ defined on $\mathbb{R}^n$ 

$$\pi_t(\phi) = \pi_0(\phi) + \int_0^t \pi_s(\mathcal{L}_s\phi)\,ds + \sum_{k=1}^d \int_0^t [\pi_s(h_s^k\,\phi) - \pi_s(h_s^k)\,\pi_s(\phi)]\,[dY_s^k - \pi_s(h_s^k)\,ds], \qquad (15)$$

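where for all $t \ge 0$, the backward diffusion operator $\mathcal{L}_t$ is defined by 

$$\mathcal{L}_t = \sum_{i=1}^n f_t^i\,\frac{\partial}{\partial x_i} + \frac{1}{2}\sum_{i,j=1}^n a_t^{ij}\,\frac{\partial^2}{\partial x_i\,\partial x_j}.$$

The Str form of equation (15) is obtained, after straightforward computations, as : 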

$$\pi_t(\phi) = \pi_0(\phi) + \int_0^t \pi_s(\mathcal{L}_s\phi)\,ds - \frac{1}{2}\int_0^t [\pi_s(|h_s|^2\,\phi) - \pi_s(|h_s|^2)\,\pi_s(\phi)]\,ds + \sum_{k=1}^d \int_0^t [\pi_s(h_s^k\,\phi) - \pi_s(h_s^k)\,\pi_s(\phi)] \circ dY_s^k. \qquad (16)$$

From now on we proceed formally, and we assume that for all $t \ge 0$, the probability distribution 
$\pi_t$ has a density $p_t$ w.r.t. the Lebesgue measure on $\mathbb{R}^n$. Then $\{p_t,\ t \ge 0\}$ satisfies the Ito-type 
stochastic partial differential equation (SPDE) 

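$$dp_t = \mathcal{L}_t^*\,p_t\,dt + \sum_{k=1}^d p_t\,[h_t^k - E_{p_t}\{h_t^k\}]\,[dY_t^k - E_{p_t}\{h_t^k\}\,dt] \qquad (17)$$

in a suitable functional space, where $E_{p_t}\{\cdot\}$ denotes the expectation w.r.t. the probability 
density $p_t$, i.e. the conditional expectation given the observations up to time $t$, and where for 
all $t \ge 0$, the forward diffusion operator $\mathcal{L}_t^*$ is defined by 

$$\mathcal{L}_t^*\phi = -\sum_{i=1}^n \frac{\partial}{\partial x_i}[f_t^i\,\phi] + \frac{1}{2}\sum_{i,j=1}^n \frac{\partial^2}{\partial x_i\,\partial x_j}[a_t^{ij}\,\phi]$$

for any test function $\phi$ defined on $\mathbb{R}^n$. The corresponding Str form of the SPDE (17) is : 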



$$dp_t = \mathcal{L}_t^*\,p_t\,dt - \frac{1}{2}\,p_t\,[|h_t|^2 - E_{p_t}\{|h_t|^2\}]\,dt + \sum_{k=1}^d p_t\,[h_t^k - E_{p_t}\{h_t^k\}] \circ dY_t^k.$$

In order to simplify notation, we introduce the following definitions : 

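$$\gamma_t^0(p) := \frac{1}{2}\,[|h_t|^2 - E_p\{|h_t|^2\}]\,p, \qquad \gamma_t^k(p) := [h_t^k - E_p\{h_t^k\}]\,p, \qquad (18)$$

for $k = 1, \dots, d$. The Str form of the Kushner-Stratonovich equation reads now 

$$dp_t = \mathcal{L}_t^*\,p_t\,dt - \gamma_t^0(p_t)\,dt + \sum_{k=1}^d \gamma_t^k(p_t) \circ dY_t^k. \qquad (19)$$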
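
As a small numerical check of these definitions, the sketch below evaluates $\gamma_t^0(p)$ and $\gamma_t^k(p)$ on a grid for a scalar observation function and verifies that both integrate to zero, so that the correction terms in the Str form of the equation do not change the total mass of the density. The density, the cubic observation function, the grid and the function names are assumptions made for the example.

```python
import math

def gauss(x, mu, s):
    """Gaussian density N(mu, s^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

DX = 0.01
GRID = [-12.0 + DX * i for i in range(2401)]

def integrate(f):
    """Riemann sum of f over the uniform grid."""
    return sum(f(x) for x in GRID) * DX

def gamma_terms(p, h):
    """Return the functions gamma^0(p) and gamma^1(p) for a scalar observation function h."""
    Eh = integrate(lambda x: h(x) * p(x))          # E_p{h}
    Eh2 = integrate(lambda x: h(x) ** 2 * p(x))    # E_p{|h|^2}
    gamma0 = lambda x: 0.5 * (h(x) ** 2 - Eh2) * p(x)
    gamma1 = lambda x: (h(x) - Eh) * p(x)
    return gamma0, gamma1

# hypothetical two-component mixture density and cubic observation function
p = lambda x: 0.3 * gauss(x, -1.0, 0.6) + 0.7 * gauss(x, 2.0, 1.0)
h = lambda x: x ** 3
g0, g1 = gamma_terms(p, h)
mass0, mass1 = integrate(g0), integrate(g1)
```

Both `mass0` and `mass1` are zero up to rounding, because each term has the form of a centred quantity multiplied by $p$.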

This equation can be projected according to the $L^2$ direct metric we introduced above, 
similarly to how we projected the Fokker-Planck equation for the prediction step in the discrete 
time observation case. There the projection transformed a PDE into an ODE, whereas in our 
current case the projection will transform an SPDE into an SDE. 

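Take again a curve in the simple mixture family, 

$$t \mapsto p(\cdot,\theta(t)),$$

and notice that the left hand side of the Kushner-Stratonovich SPDE for this density would 
read 

$$d_t\,p(\cdot,\theta(t)) = \sum_{i=1}^m (q_i - q_{m+1})\,d_t\theta_i(t),$$

and project the right hand side terms of the Kushner-Stratonovich SPDE (19) as 

$$\Pi_\theta[\mathcal{L}_t^*\,p(\cdot,\theta)] = \sum_{i=1}^m \Big[\sum_{j=1}^m h^{ij}\,\langle p(\cdot,\theta),\ \mathcal{L}_t(q_j - q_{m+1})\rangle\Big]\,(q_i - q_{m+1}),$$

$$\Pi_\theta[\gamma_t^k(p(\cdot,\theta))] = \sum_{i=1}^m \Big[\sum_{j=1}^m h^{ij}\,\langle \gamma_t^k(p(\cdot,\theta)),\ q_j - q_{m+1}\rangle\Big]\,(q_i - q_{m+1}).$$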
Now equating the two sides we obtain 

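$$\sum_{i=1}^m (q_i - q_{m+1})\,d_t\theta_i(t) = \sum_{i=1}^m (q_i - q_{m+1}) \sum_{j=1}^m h^{ij}\Big[\langle p(\cdot,\theta),\ \mathcal{L}_t(q_j - q_{m+1})\rangle\,dt - \langle \gamma_t^0(p(\cdot,\theta)),\ q_j - q_{m+1}\rangle\,dt + \sum_{k=1}^d \langle \gamma_t^k(p(\cdot,\theta)),\ q_j - q_{m+1}\rangle \circ dY_t^k\Big],$$

which yields the stochastic differential equation for the parameters $\theta$ of the projected density: 

$$d_t\theta_i(t) = \sum_{j=1}^m h^{ij}\Big[\langle p(\cdot,\theta),\ \mathcal{L}_t(q_j - q_{m+1})\rangle\,dt - \langle \gamma_t^0(p(\cdot,\theta)),\ q_j - q_{m+1}\rangle\,dt + \sum_{k=1}^d \langle \gamma_t^k(p(\cdot,\theta)),\ q_j - q_{m+1}\rangle \circ dY_t^k\Big].$$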

Similarly to what we did for the discrete time observations case, we can write this SDE in more 
compact form as 

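$$d_t\theta(t) = h^{-1}\langle \mathcal{L}_t(q_{1:m} - 1_m q_{m+1}),\ p(\cdot,\theta(t))\rangle\,dt - h^{-1}\langle \gamma_t^0(p(\cdot,\theta)),\ q_{1:m} - 1_m q_{m+1}\rangle\,dt + \sum_{k=1}^d h^{-1}\langle \gamma_t^k(p(\cdot,\theta)),\ q_{1:m} - 1_m q_{m+1}\rangle \circ dY_t^k. \qquad (20)$$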

Notice that now only the prediction $dt$ part is linear in $\theta$. More generally, by inspection one can 
see that the equation is quadratic in $\theta$, since the terms $\gamma_t^0$ and $\gamma_t^k$ contain the product of 
$p(\cdot,\theta)$ with an expectation taken under $p(\cdot,\theta)$, and both factors are affine in $\theta$ for a 
mixture family. One can define a projection residual in $L^2$ norm, measuring 
the local projection error of the filter. This residual can be made rigorous under a specific 
mixture family incorporating a pseudo likelihood ratio update factor into each mixture family 
member function $q$. This, together with a numerical investigation of the effectiveness of the filter for 
some standard systems, is under investigation in [8]. 
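
The quadratic dependence on $\theta$ can be checked numerically. The sketch below computes the drift and diffusion coefficients of the projected SDE for a two-component mixture ($m = 1$) and takes finite differences of the diffusion coefficient in $\theta$. It also includes one step of the Heun scheme, a standard integrator consistent with the Stratonovich interpretation. The scalar signal $dX = -AX\,dt + \Sigma\,dW$, the observation function $h(x) = x$, the Gaussian components, the grid and all names are assumptions made for this illustration; this is not the numerical scheme of [8].

```python
import math

A, SIGMA = 1.0, 1.0                    # assumed signal: dX = -A X dt + SIGMA dW
COMPS = [(-1.0, 0.8), (1.0, 0.8)]      # assumed components q_1, q_2 as (mean, std)
DX = 0.01
GRID = [-10.0 + DX * i for i in range(2001)]

def gauss(x, mu, s):
    """Gaussian density N(mu, s^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def L_gauss(x, mu, s):
    """Backward operator of the assumed signal applied to N(mu, s^2), in closed form."""
    q = gauss(x, mu, s)
    dq = -(x - mu) / s ** 2 * q
    d2q = ((x - mu) ** 2 / s ** 4 - 1.0 / s ** 2) * q
    return -A * x * dq + 0.5 * SIGMA ** 2 * d2q

def h_obs(x):
    return x                           # assumed scalar observation function

def integrate(f):
    """Riemann sum of f over the uniform grid."""
    return sum(f(x) for x in GRID) * DX

def coefficients(theta):
    """Drift and diffusion coefficients of the scalar projected SDE (m = 1)."""
    (m1, s1), (m2, s2) = COMPS
    v = lambda x: gauss(x, m1, s1) - gauss(x, m2, s2)          # q_1 - q_2
    Lv = lambda x: L_gauss(x, m1, s1) - L_gauss(x, m2, s2)     # L(q_1 - q_2)
    p = lambda x: gauss(x, m2, s2) + theta * v(x)              # p(., theta)
    metric = integrate(lambda x: v(x) ** 2)                    # h = <v, v>
    Eh = integrate(lambda x: h_obs(x) * p(x))
    Eh2 = integrate(lambda x: h_obs(x) ** 2 * p(x))
    gamma0 = lambda x: 0.5 * (h_obs(x) ** 2 - Eh2) * p(x)
    gamma1 = lambda x: (h_obs(x) - Eh) * p(x)
    drift = (integrate(lambda x: p(x) * Lv(x))
             - integrate(lambda x: gamma0(x) * v(x))) / metric
    diffusion = integrate(lambda x: gamma1(x) * v(x)) / metric
    return drift, diffusion

def heun_step(theta, dt, dY):
    """One Heun step of d theta = drift dt + diffusion o dY (Stratonovich form)."""
    a0, g0 = coefficients(theta)
    pred = theta + a0 * dt + g0 * dY
    a1, g1 = coefficients(pred)
    return theta + 0.5 * (a0 + a1) * dt + 0.5 * (g0 + g1) * dY

# second and third finite differences of the diffusion coefficient in theta
g = [coefficients(t)[1] for t in (0.1, 0.3, 0.5, 0.7)]
second_diff = g[0] - 2.0 * g[1] + g[2]
third_diff = g[0] - 3.0 * g[1] + 3.0 * g[2] - g[3]
theta_next = heun_step(0.4, 0.01, 0.05)
```

The third difference vanishes up to rounding while the second does not, which is the numerical signature of a quadratic polynomial in $\theta$. The density $p(\cdot,\theta)$ is a proper mixture only for $\theta \in [0,1]$, and this sketch does not enforce that constraint.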



8 Conclusion and Further Research 

We introduced a projection method for the space of probability distributions based on the 
differential geometric approach to statistics. This method makes use of a direct $L^2$ metric as 
opposed to the usual Hellinger distance and the related Fisher Information metric. We applied 
this apparatus to the nonlinear filtering problem. Past projection filters concentrated on the 
Fisher metric and the exponential families that made the filter correction step exact. Instead, 
in this work we introduced the mixture projection filter, namely the projection filter based on 
the direct $L^2$ metric and on a manifold given by a mixture of pre-assigned densities. 
We derived the filter equations for the discrete time observation case first. We showed how 
an update on the manifold functions, even when keeping the same dimension, can make the 
correction step exact. A key result is that the prediction step is a simple linear ordinary 
differential equation. 

We then derived the continuous time observations filter by projecting the Kushner-Stratonovich 
stochastic PDE in Stratonovich form, and obtained an SDE whose prediction part is linear, with 
additional quadratic terms both in the drift and in the diffusion part. 

We finally remarked that the exponential projection filter had a clear relationship with the 
assumed density filters, as documented in ^0]. The mixture projection filter introduced here 
has a clear relationship with earlier Galerkin-based approaches when applied to simple mixture 
families, although even for such families there are important differences in the continuous time 
observations case. In future work [8] we will also implement the mixture projection filter 
equations numerically and will investigate the choice of the specific mixture family and the 
projection error. 